Exploiting Shared Memory to Improve Parallel I/O Performance
Authors
Abstract
We explore several methods utilizing system-wide shared memory to improve the performance of MPI-IO, particularly for noncontiguous file access. We introduce an abstraction called the datatype iterator that permits efficient, dynamic generation of (offset, length) pairs for a given MPI derived datatype. Combining datatype iterators with overlapped I/O and computation, we demonstrate how a shared memory MPI implementation can utilize more than 90% of the available disk bandwidth (in some cases representing a 5× performance improvement over existing methods) even for extreme cases of noncontiguous datatypes. We generalize our results to suggest possible parallel I/O performance improvements on systems without global shared memory.
Similar papers
Specification and Performance Evaluation of Parallel I/O Interfaces for OpenMP
One of the most severe performance limitations of parallel applications today stems from the performance of I/O operations. Numerous projects have shown that parallel I/O in combination with parallel file systems can significantly improve the performance of I/O operations. However, as of today there is no support for parallel I/O operations for applications using shared-memory programming mode...
Development and Evaluation of High-Performance Decorrelation Algorithms for the Nonalternating 3D Wavelet Transform
We introduce and evaluate the implementations of three parallel video-sequence decorrelation algorithms. The proposed algorithms are based on the nonalternating classic three-dimensional wavelet transform (3D-WT). The parallel implementations of the algorithms are developed and tested on a shared memory system, an SGI Origin 3800 supercomputer, making use of a message-passing paradigm. We evalua...
Efficient Machine-Independent Programming of High-Performance Multiprocessors
A major component of the success of scientific computing is the rapid increase in computing capability. Parallel computing can provide the next great leap in the computation power scientists and engineers need to solve many important problems. The proliferation of parallel architectures, however, discourages users from writing parallel applications. Recent advances in automatic parallelization ...
A comprehensive distributed shared memory system that is easy to use and program
An analysis of the distributed shared memory (DSM) work carried out by other researchers shows that it has been able to improve the performance of applications, at the expense of ease of programming and use. Many implementations require application programmers to write code to explicitly associate shared variables with synchronization variables or to label the variables according to their acces...
Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures
This paper presents a new parallelization method for an efficient implementation of unstructured array reductions on shared memory parallel machines with OpenMP. This method is strongly related to parallelization techniques for irregular reductions on distributed memory machines as employed in the context of High Performance Fortran. By exploiting data locality, synchronization is minimized witho...